Handling StaleElementReferenceException on the Coverfox website using Python, pytest and Selenium - python-3.x

Content of test_homepage.py:
def test_insurance_pages_open_successfully_using_fixtures(page_object, load_home_page, insurance_data):
    page_object.open_insurance(insurance_data)
    assert page_object.ui.contains_text('Buying two-wheeler insurance from Coverfox is simple')
The open_insurance function in the page object home_page.py:
def open_insurance(self, insurance):
    self._ui.move_to(locators.drp_dwn_insurance)
    self._ui.click(format_locator(locators.lnk_insurance, insurance))
The move_to function in another file.py:
def move_to(self, locator):
    to_element = self.find_element(locator)
    print("element value", to_element)
    self.action.move_to_element(to_element).perform()
What I am trying to do here is: test_insurance_pages_open_successfully_using_fixtures takes three fixtures as arguments:
1. page_object, which provides a page object at session level
2. load_home_page, which loads the home page, also at session level
3. insurance_data, a fixture in conftest.py which supplies a list of link texts read from a CSV file
So, in essence, it will load the page and open all the links one by one for the website https://www.coverfox.com/.
The first test case passes for the link Two-wheelers insurance, but the second data-driven run fails with a stale element reference exception at the point where it tries to move to the insurance link again (the move_to function).
I am not storing elements anywhere, and the function is written so that it finds the element again each time.
What is causing this? Or does pytest do some sort of element caching in the background?

It seems that you should use a function-level fixture for load_home_page, or refresh the page after you have performed some actions.
In the current approach (at least as you described it), you are using the same page and page state for different tests.
Could you please share the fixtures code as well?
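As a minimal sketch, assuming the fixtures look roughly like this (the fixture bodies, HomePage and open_home_page are hypothetical names, not your actual code), dropping the session scope on load_home_page would reload the page before every parametrized run:
import pytest

@pytest.fixture(scope="session")
def page_object():
    # build the driver / page object once per session (assumed implementation)
    po = HomePage()          # hypothetical constructor for your page object
    yield po
    po.quit()                # hypothetical cleanup

@pytest.fixture                  # function scope (the default) instead of scope="session"
def load_home_page(page_object):
    # reload the home page before every run, so each run starts from a fresh DOM
    # and elements located during an earlier run are never reused
    page_object.open_home_page() # hypothetical helper that does driver.get(...)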

Related

Is an object-oriented approach the right solution for my task? If so, could you give a rough idea of how to implement it?

I'm currently attempting to develop something for my work that could be used by others. I know that the object-oriented approach is considered important, so I'm trying to envision how I would use it for what I'm trying to do, but I'm failing to see how.
I'm writing a web scraper in python using Selenium. There are tables of data that can be accessed for different clients and I would like to allow future users of my program to pull one (or maybe multiple) tables to see the data, or to use it to validate that things are properly populating on the website.
The code is still a work in progress, and I'm attempting to learn best practices and the right way to do things. Any other feedback is welcomed, I want to learn.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
import getpass
import bs4
import time
###Open a headless Chrome, grab the URL page, enter into username and password, POST
chop = Options()
driver = input().strip()
if driver == "y":
    chop.add_argument("--headless")
driver = webdriver.Chrome(r"<Path to Chromedriver.exe placeholder>", options=chop)
driver.get("<URL placeholder>")
element = driver.find_element_by_name("email")
element.clear()
element.send_keys(input("Username: ").strip())
element = driver.find_element_by_name("password")
element.clear()
element.send_keys(getpass.getpass().strip())
element.send_keys(Keys.ENTER)
###Select a client, navigate to a table
select = Select(driver.find_element_by_id("<ID placeholder>"))
select.select_by_value(input("Client #:").strip())
element = driver.find_element_by_id("<ID placeholder>")
element.click()
element = driver.find_element_by_id("<ID placeholder>")
element.click()
###Attempt to grab the entire table, print it out in terminal
###1 second sleep semi-necessary to give page time to load table
time.sleep(1)
element = driver.find_elements(By.TAG_NAME, "tr")
###Print out grabbed rows
for L in element[2:-2]:
    print(L.text)
driver.quit()
So far it just lets you navigate to one of the tables and grabs all the rows. Some rows aren't really valid, so that's why I'm slicing. The placeholders are just to hide what I'm actually working on; I don't think the company would like a bunch of random people finding the website.
Selenium is already implemented using an object-oriented approach. OOP is designed to help reduce the pain of expanding projects with extra requirements; in your case, let's say you wanted to scrape an additional website, you'd currently have to write a unique script for each site. One of the ways I abstract Selenium is by creating a controller that manages all of the Selenium overhead and is initialized with a list of parsers, each responsible for scraping an individual website and returning its results. This is probably better explained with code:
from selenium import webdriver

class WebParser:
    def parse(self, browser: webdriver.Firefox):
        # A unique script for handling this particular webpage
        ...

class SeleniumController:
    def __init__(self, parsers: list[WebParser]):
        self.parsers = parsers
        self.browser = webdriver.Firefox()
        # Finish initializing the browser either here, or in a separate function
        ...

    def trigger(self):
        result = []
        for p in self.parsers:
            result.append(p.parse(self.browser))
        # Handle your results
        ...
You can then create a child class of WebParser for each different webpage that you need to parse.
This kind of problem might also benefit from using an adapter pattern or some sort of abstract factory to aid in creating web parsers at runtime. refactoring.guru is an amazing site for applying object-oriented concepts to different kinds of problems.
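For example, a concrete child of WebParser for a single page might look like this (a minimal sketch; the class name, URL and the tr-row scraping are assumptions, not part of the original code):
from selenium.webdriver.common.by import By

class TableParser(WebParser):
    # hypothetical parser for one specific page
    def parse(self, browser: webdriver.Firefox):
        browser.get("https://example.com/some-table")    # placeholder URL
        rows = browser.find_elements(By.TAG_NAME, "tr")
        return [row.text for row in rows]

# usage: the controller owns the browser and drives any number of parsers
controller = SeleniumController([TableParser()])
controller.trigger()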

How to open "partial" links using Python?

I'm working on a webscraper that opens a webpage, and prints any links within that webpage if the link contains a keyword (I will later open these links for further scraping).
For example, I am using the requests module to open "cnn.com", and then trying to parse out all href/links within that webpage. Then, if any of the links contain a specific word (such as "china"), Python should print that link.
I could just simply open the main page using requests, save all hrefs into a list ('links'), and then use:
links = [...]
keyword = "china"
for link in links:
    if keyword in link:
        print(link)
However, the problem with this method is that the links I originally parsed out aren't full links. For example, all links on CNBC's webpage are structured like this:
href="https://www.cnbc.com/2019/08/11/how-recession-affects-tech-industry.html"
But for CNN's page, they're written like this (not full links... they're missing the part that comes before the "/"):
href="/2019/08/10/europe/luxembourg-france-amsterdam-tornado-intl/index.html"
This is a problem because I'm writing more code to automatically open these links and parse them. But Python can't open
"/2019/08/10/europe/luxembourg-france-amsterdam-tornado-intl/index.html"
because it isn't a full link.
So, what is a robust solution to this (something that works for other sites too, not just CNN)?
EDIT: I know the links I wrote as an example in this post don't contain the word "China", but these are just examples.
Try using the urljoin function from the urllib.parse module. It takes two parameters: the first is the URL of the page you're currently parsing, which serves as the base for relative links; the second is the link you found. If the link you found starts with http:// or https://, it'll return just that link; otherwise it will resolve the URL relative to what you passed as the first parameter.
So for example:
#!/usr/bin/env python3
from urllib.parse import urljoin

print(
    urljoin(
        "https://www.cnbc.com/",
        "/2019/08/10/europe/luxembourg-france-amsterdam-tornado-intl/index.html"
    )
)
# prints "https://www.cnbc.com/2019/08/10/europe/luxembourg-france-amsterdam-tornado-intl/index.html"

print(
    urljoin(
        "https://www.cnbc.com/",
        "http://some-other.website/"
    )
)
# prints "http://some-other.website/"

Scrapy: Running one spider, then using information gathered to run another spider

In the Scrapy docs, the example they give for running multiple spiders is something like this:
process = CrawlerProcess()
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start()
However, the problem is that I want to run Spider1, parse the data, and then use the extracted data to run Spider2. If I do something like:
process.crawl(MySpider1)
process.start()
parse_data_from_spider1()
pass_data_to_spider2_class()
process2.crawl(MySpider2)
process2.start()
It gives me the dreaded ReactorNotRestartable error. Could someone guide me on how to do what I'm trying to achieve here?
The code you're using from the docs runs multiple spiders in the same process using the internal API, so that's an issue if you need to wait for the first spider to finish before starting the second.
If this is the entire scope of the issue, my suggestion would be to store the data from the first spider in a place where the second one can consume it (database, CSV, JSON lines), and bring that data into the second spider run, either in the spider definition (where name is defined, or, if you've got subclasses of scrapy.Spider, maybe in __init__) or in the start_requests() method.
Then you'll have to run the spiders sequentially; you can see the CrawlerRunner() deferred-chaining approach in the common practices section of the docs:
from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

configure_logging()
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(MySpider1)
    yield runner.crawl(MySpider2)
    reactor.stop()

crawl()
reactor.run()
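As a rough sketch of the "store and consume" part (the file name items_spider1.jl and the url field are assumptions, not something from the question), the first spider could export its items as JSON lines and the second spider could read them back in start_requests():
import json
import scrapy

class MySpider2(scrapy.Spider):
    name = "spider2"

    def start_requests(self):
        # consume the data MySpider1 stored earlier (hypothetical file / field names)
        with open("items_spider1.jl") as f:
            for line in f:
                item = json.loads(line)
                yield scrapy.Request(item["url"], callback=self.parse)

    def parse(self, response):
        ...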

How do I make all rows of a pytest-html single-page HTML report default to a collapsed state?

I use the pytest runner to get the output of results from my automated test frameworks (Selenium and REST API tests). I use the pytest-html plugin to generate a single-page HTML result file at the end of the test run. I initiate the test run with the following command (from an active virtualenv session):
python -m pytest -v --html="api_name_test.html" --self-contained-html
(It's a little more complicated in that I use a PowerShell script to run this, provide a datetime-stamped result file name and email the file when it's finished, but it's essentially the above command.)
When the report is generated and I open the HTML file, I find that all the non-passing tests are expanded. I want to make it so all rows are collapsed by default (Failed, XFailed, Error, etc.).
My project contains a conftest.py file at the directory root and a pytest.ini file where I specify the directory for the test scripts.
In the conftest.py file in my simplest projects, I have one optional hook to obtain the target url of the tests and put that in the report summary:
import pytest
from py._xmlgen import html
import os
import rootdir_ref
import simplejson

@pytest.mark.optionalhook
def pytest_html_results_summary(prefix):
    theRootDir = os.path.dirname(rootdir_ref.__file__)
    credentials_path = os.path.join(theRootDir, 'TestDataFiles', 'API_Credentials.txt')
    target_url = simplejson.load(open(credentials_path)).get('base_url')
    prefix.extend([html.p("Testing against URL: " + target_url)])
The GitHub page mentions that a query parameter can be used to collapse rows with various results, but it doesn't mention where this information is entered.
https://github.com/pytest-dev/pytest-html
"By default, all rows in the Results table will be expanded except those that have Passed. This behavior can be customized with a query parameter: ?collapsed=Passed,XFailed,Skipped"
Currently I'm unsure whether the ?collapsed=... part goes on the command line, in conftest.py as a hook, or whether I need to edit the default style.css or main.js that comes with the pytest-html plugin. (Also, I'm not familiar with CSS and only know a small amount of HTML.) I'm assuming it goes in the conftest.py file as a hook, but I don't really understand how to apply it.
https://pytest-html.readthedocs.io/en/latest/user_guide.html#display-options
Auto Collapsing Table Rows
By default, all rows in the Results table will be expanded except those that have Passed.
This behavior can be customized either with a query parameter: ?collapsed=Passed,XFailed,Skipped or by setting the render_collapsed in a configuration file (pytest.ini, setup.cfg, etc).
[pytest]
render_collapsed = True
NOTE: Setting render_collapsed will, unlike the query parameter, affect all statuses.
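To answer the "where does it go" part of the question: the ?collapsed=... query parameter is simply appended to the report's URL when you open it in a browser, e.g. something like api_name_test.html?collapsed=Passed,XFailed,Skipped, whereas render_collapsed goes in your pytest.ini (or setup.cfg) as shown above, which is probably the better fit for a self-contained report that gets emailed around.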

Where do I place the validation exception code in my pyramid app?

I have a model file in my pyramid app, and inside of that model file, I am doing automatic validation before an insert using formencode. A failed validation inside of my model file raises a formencode.Invalid exception.
I found the following documentation on how to set up a custom exception view, but I am unclear on a couple of things:
1. Where do I put the actual exception view code? This is clearly view code, so it should be in a view somewhere. Should it be in its own view file? I've pasted the code I need to place at the bottom.
2. How do I make the rest of my Pyramid app aware of this code? The only obvious way that I see is to import the view file inside of my model files, but that gives me a bad taste in my mouth. I'm sure there must be another way to do it, but I'm not sure what that is.
Code to place:
from pyramid.response import Response
from pyramid.view import view_config
from helloworld.exceptions import ValidationFailure

@view_config(context=ValidationFailure)
def failed_validation(exc, request):
    response = Response('Failed validation: %s' % exc.msg)
    response.status_int = 500
    return response
1) Anywhere in your project directory. I made a new file called exceptions.py where I place all my HTTP status code and validation exceptions. I placed this file in the same directory as my views.py, models.py, etc.
2) That bad taste in your mouth is Python, because importing methods is the Pythonic way to go about using classes and functions in other files, rather than some sort of magic. Might be weird at first, but you'll quickly get used to it. Promise.
I want to note that in your models.py file, you're only going to be importing ValidationFailure from helloworld.exceptions and raising ValidationFailure wherever you want. You aren't importing the whole view function you've defined (failed_validation). That's why the context for that view function is ValidationFailure: Pyramid knows to go there when you simply raise ValidationFailure.
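As a minimal sketch of how the pieces fit together (the exception class body and the insert_record model function are illustrative assumptions, not code from the question):
# helloworld/exceptions.py
class ValidationFailure(Exception):
    def __init__(self, msg):
        self.msg = msg


# helloworld/models.py -- imports only the exception, never the view
from helloworld.exceptions import ValidationFailure

def insert_record(data):                  # hypothetical model function
    if not data.get("name"):
        raise ValidationFailure("name is required")
    # ... perform the insert ...
The failed_validation view itself is registered when your Pyramid app runs config.scan(), so nothing in the model layer ever needs to import it.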
