Selenium doesn't read all the content of a webpage - python-3.x

I'm a newbie with Selenium. I have defined a batch process that handles several pages. These pages are all identical but contain different data, so I process them all with the same code. When I start the process it works fine but, from time to time, I get the same error at the same point. I have found that, when the process fails, there are two tables on the page whose code I cannot get. And I don't understand it at all, because if I restart the process on the very page that just failed, it works fine! So it seems that Selenium does not always load the content of the web page completely.
I use Python 3 and Selenium with this code:
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "normal"
#b = WebDriver(executable_path="./chromedriver")
b = webdriver.Chrome(desired_capabilities=caps, executable_path="./chromedriver")
b.set_window_size(300, 300)
b.get(url)
html = b.page_source
self.setSoup(bs4.BeautifulSoup(html,'html.parser'))
b.close()
How can I avoid this error, which only occurs from time to time?
Edit I:
I have checked that, when the process works fine, this statement returns two tables:
tables = self.get_soup().findAll("table", class_="competitor-table comparative responsive")
But when the process goes wrong, this code returns 0 tables. As I said before, if I reprocess the web page that previously gave me the error, it works fine and returns two tables instead of zero.
For this reason, I suppose that Selenium does not always return the full code of the page: for the same page, when it goes wrong it returns zero tables and when it works fine it returns two.
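A crude workaround would be to reload the page until both tables show up; a rough sketch (the three-attempt cap is an arbitrary choice, everything else reuses the code shown above):

for attempt in range(3):
    b = webdriver.Chrome(desired_capabilities=caps, executable_path="./chromedriver")
    b.get(url)
    soup = bs4.BeautifulSoup(b.page_source, 'html.parser')
    b.close()
    # the page only rendered completely if both box-score tables are present
    tables = soup.findAll("table", class_="competitor-table comparative responsive")
    if len(tables) == 2:
        break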
Edit II:
For example, right now I've got an error on this page:
http://www.fiba.basketball/euroleaguewomen/18-19/game/1912/Olympiacos-Perfumerias-Avenida#|tab=boxscore
The tables that I try to retrieve, and don't get, are the two tables with this CSS class. I don't post their content here because they are so big.
This is the code where I try to get the content of the tables:
def set_boxscore(self):
    tables = self.get_soup().findAll("table", class_="competitor-table comparative responsive")
    local = False
    print("Total tablas: {}".format(len(tables)))
    for t in tables:
        local = not local
        if local:
            self.home_team.set_stats(t.tfoot.find("tr", class_="team-totals"))
        else:
            self.away_team.set_stats(t.tfoot.find("tr", class_="team-totals"))
        rows = t.tbody.findAll("tr")
        for row in rows:
            time = row.find("td", class_="min").string
            if time.find(MESSAGES.MESSAGE_PLAYER_NOT_PLAY) == -1:
                if local:
                    player = PlayerEuropa(row)
                    self.home_players.append(player)
                else:
                    player = PlayerEuropa(row)
                    self.away_players.append(player)
In this code I print the total number of tables found and, as you can see, right now I get zero tables.
And if I now restart the process, it works correctly.
Edit III:
One more example of the process I have defined. These URLs have been processed correctly:
http://www.fiba.basketball/eurocupwomen/18-19/game/2510/VBW-CEKK-Ceglèd-Rutronik-Stars-Keltern#|tab=boxscore
http://www.fiba.basketball/eurocupwomen/18-19/game/2510/Elfic-Fribourg-Tarbes-GB#|tab=boxscore
http://www.fiba.basketball/eurocupwomen/18-19/game/2510/Basket-Landes-BBC-Sint-Katelijne-Waver#|tab=boxscore
But when I tried to process this other URL, I got the error explained previously:
http://www.fiba.basketball/eurocupwomen/18-19/game/0111/Gorzow-Sparta-k-M-R--Vidnoje#|tab=boxscore
To render the web page I use Selenium, and I always do it at the beginning of the process. I get the content of the web page with this code:
def __init__(self, url):
    """Constructor"""
    caps = DesiredCapabilities().CHROME
    caps["pageLoadStrategy"] = "normal"
    #b = WebDriver(executable_path="./chromedriver")
    b = webdriver.Chrome(desired_capabilities=caps, executable_path="./chromedriver")
    b.set_window_size(300, 300)
    b.get(url)
    html = b.page_source
    self.setSoup(bs4.BeautifulSoup(html, 'html.parser'))
    b.close()
After this code runs I start to retrieve the information from the web page. For some reason the web page is sometimes not rendered completely, because when I try to get some information it is not found and I get the error explained previously.
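A possible mitigation would be to wait explicitly for the tables (presumably rendered by JavaScript after the initial load) before reading page_source. A minimal sketch using Selenium's explicit waits; the 30-second timeout is an arbitrary choice and the selector is the one used in set_boxscore above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

b = webdriver.Chrome(desired_capabilities=caps, executable_path="./chromedriver")
b.get(url)
# block until both box-score tables are attached to the DOM, or give up after 30 s
WebDriverWait(b, 30).until(
    lambda d: len(d.find_elements(By.CSS_SELECTOR,
                                  "table.competitor-table.comparative.responsive")) >= 2
)
html = b.page_source
self.setSoup(bs4.BeautifulSoup(html, 'html.parser'))
b.close()

If the tables never show up, the wait raises a TimeoutException instead of silently handing back an incomplete page.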

Related

understanding Python Nextion Display - Page Change

tl;dr
Looking for a way to change the active page in Python 3 using the Nextion library.
I have tried x = 1, x = 'page1', x = 'page 1' and many other iterations.
client = Nextion('/dev/ttyS0', 9600, event_handler)
await client.connect()
await client.set('page', x)
Hi everyone, I am making a Nextion display to attach to the outside of a Raspberry Pi to show some operational values, such as whether a serial port is connected, GPS location data, CPU operating temperature, etc.
The logic for collecting and displaying the data is all sorted, but I am having issues with the basics of the Nextion library and how to do what seems like a simple thing: change the active page.
OK, so I finally worked it out. It looks like the example I used had await client.set(x, y) for the text fields but nothing for the page. After reading the library file for the 100th time I noticed a function called write_command. I tried this:
client.write_command(next_page)
And it worked.
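For reference, a minimal end-to-end sketch under the same assumptions (the constructor arguments are the ones from the snippet above; 'page 1' follows the standard Nextion "page" instruction and the exact page id depends on the HMI project; whether write_command needs to be awaited may depend on the library version):

import asyncio
from nextion import Nextion  # the library used above

def event_handler(type_, data):
    # placeholder; a real handler would react to touch/page events
    print(type_, data)

async def show_page(page_id):
    client = Nextion('/dev/ttyS0', 9600, event_handler)
    await client.connect()
    # send a raw Nextion instruction; "page <id>" switches the active page
    client.write_command('page {}'.format(page_id))

asyncio.run(show_page(1))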

What is making the webbrowser close before it finishes?

I have the code below, which I know has worked before but for some reason seems to be broken now. The code is meant to open a search engine, search for a query and return a list of results by their href attribute. The web browser opens and navigates to http://www.startpage.com successfully, it then puts the term I have entered at the bottom into the search box, but then just closes the browser. No error, no links. Nothing.
import selenium.webdriver as webdriver

def get_results(search_term):
    url = "https://www.startpage.com"
    browser = webdriver.Firefox()
    browser.get(url)
    search_box = browser.find_element_by_id("query")
    search_box.send_keys(search_term)
    search_box.submit()
    try:
        links = browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a")
    except:
        links = browser.find_elements_by_xpath("//h3//a")
    results = []
    for link in links:
        href = link.get_attribute("href")
        print(href)
        results.append(href)
    browser.close()
    return results

get_results("dog")
Does anyone know what is wrong with this? Basically it gets to search_box.submit() then skips everything until browser.close().
Unlike find_element_by_xpath (which returns a single WebElement and raises an exception when nothing matches), find_elements_by_xpath won't throw an exception if it finds no results; it returns an empty list. links is empty, so the for loop is never executed. You can change the try/except to an if condition and check whether it has values:
links = browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a")
if not links:
    links = browser.find_elements_by_xpath("//h3//a")
It is not recommended to call the browser close function within the function that you are testing. Instead, you can call it after get_results("dog") and keep it out of the test logic:
get_results("dog")
browser.close()
By doing it this way, Selenium will complete the execution of the function first and then close the browser window.
The problem with your solution is that the method returns the result set after the browser closes the window, which is why you are facing a logic problem in your script.
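Putting the two answers together, a sketch of how the function could be restructured (passing the browser in and the 10-second explicit wait are my additions; the locators are the ones from the question):

import selenium.webdriver as webdriver
from selenium.webdriver.support.ui import WebDriverWait

def get_results(browser, search_term):
    browser.get("https://www.startpage.com")
    search_box = browser.find_element_by_id("query")
    search_box.send_keys(search_term)
    search_box.submit()
    # give the results page up to 10 s to render at least one link
    WebDriverWait(browser, 10).until(lambda b: b.find_elements_by_xpath("//h3//a"))
    links = browser.find_elements_by_xpath("//ol[@class='web_regular_results']//h3//a")
    if not links:  # empty list, not an exception
        links = browser.find_elements_by_xpath("//h3//a")
    return [link.get_attribute("href") for link in links]

browser = webdriver.Firefox()
print(get_results(browser, "dog"))
browser.close()  # closed outside the function, as suggested above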

Random failure of selenium test on test server

I'm working on a project which uses Node.js and Nightwatch for test automation. The problem is that the tests are not reliable and give lots of false positives. I did everything to make them stable and I'm still getting the errors. I went through some blogs like https://bocoup.com/blog/a-day-at-the-races and did some code refactoring. Does anyone have suggestions to solve this issue? At this moment I have two options: either I rewrite the code in Java (removing Node.js and Nightwatch from the solution, as I'm far more comfortable in Java than JavaScript and struggle most of the time with the non-blocking nature of JavaScript), or I take snapshots, review app logs and run one test at a time.
Test environment:
Server - Linux
Display - Framebuffer
Total VMs - 9, with Selenium nodes running the tests in parallel
Browser - Chrome
The type of error I get is "element not found". Most of the time the tests fail as soon as the page is loaded. I have already set an 80-second timeout, so time can't be the issue. The tests run in parallel, but on separate VMs, so I don't know whether that could be an issue or not.
Edit 1:
I was working on this to find the root cause. I did the following things to eliminate the random failures:
a. Added --suiteRetries to retry the failed cases.
b. Went through the error screenshot and DOM source. Everything seems fine.
c. Replaced the browser.pause with explicit waits
Also, while debugging, I observed one problem; maybe that is the issue which is causing the random failures. Here's the code snippet:
for (var i = 0; i < apiResponse.data.length; i++) {
    var name = apiResponse.data[i];
    browser.useXpath().waitForElementVisible(pageObject.getDynamicElement("#topicTextLabel", name.trim()), 5000, false);
    browser.useCss().assert.containsText(
        pageObject.getDynamicElement("#topicText", i + 1),
        name.trim(),
        util.format(issueCats.WRONG_DATA)
    );
}
I added the XPath check to validate whether I'm waiting long enough for that text to appear. I observed that the visibility assertion passes, but in the next assertion #topicText comes back as the previous value or null. This is an intermittent issue, but on the test server it happens frequently.
There is no magic bullet for brittle UI end-to-end tests. In an ideal world there would be an option, avoid_random_failures=true, that would quickly and easily solve the problem, but for now it's only a dream.
Simply rewriting all tests in Java will not solve the problem, but if you feel better in Java, then I would definitely go in that direction.
As you already know from this article Avoiding random failures in Selenium UI tests there are 3 commonly used avoidance techniques for race conditions in UI tests:
using constant sleep
using WebDriver's "implicit wait" parameter
using explicit waits (WebDriverWait + ExpectedConditions + FluentWait)
These techniques are also briefly mentioned on WebDriver: Advanced Usage, you can also read about them here: Tips to Avoid Brittle UI Tests
Methods 1 and 2 are generally not recommended: they have drawbacks, and while they can work well on simple HTML pages, they are not 100% reliable on AJAX pages and they slow down the tests. The best one is #3 - explicit waits.
In order to use technique #3 (explicit waits) you need to familiarize yourself and be comfortable with the following WebDriver tools (I point to their Java versions, but they have counterparts in other languages):
WebDriverWait class
ExpectedConditions class
FluentWait - used very rarely, but very useful in some difficult cases
ExpectedConditions has many predefined wait states; the most used (in my experience) is ExpectedConditions#elementToBeClickable, which waits until an element is visible and enabled such that you can click it.
How to use it - an example: say you open a page with a form which contains several fields into which you want to enter data. Usually it is enough to wait until the first field appears on the page and becomes editable (clickable):
By field1 = By.xpath("//div//input[.......]");
By field2 = By.id("some_id");
By field3 = By.name("some_name");
By buttonOk = By.xpath("//input[ text() = 'OK' ]");
....
....
WebDriverWait wait = new WebDriverWait( driver, 60 ); // wait max 60 seconds
// wait max 60 seconds until element is visible and enabled such that you can click it
// if you can click it, that means it is editable
wait.until( ExpectedConditions.elementToBeClickable( field1 ) ).sendKeys("some data" );
driver.findElement( field2 ).sendKeys( "other data" );
driver.findElement( field3 ).sendKeys( "name" );
....
wait.until( ExpectedConditions.elementToBeClickable( buttonOK)).click();
The above code waits until field1 becomes editable after the page is loaded and rendered - but no longer, exactly as long as necessary. If the element does not become visible and editable within 60 seconds, the test fails with a TimeoutException.
Usually it's only necessary to wait for the first field on the page; if it becomes active, then the others will be too.
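For comparison, since the question is about a JavaScript stack, the same explicit-wait pattern in the Python bindings looks like this (sketch only; the page URL and locator are made up for illustration):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/form")  # hypothetical page with the form
wait = WebDriverWait(driver, 60)  # wait max 60 seconds, as in the Java example

field1 = (By.XPATH, "//div//input[@name='first']")  # hypothetical locator
# wait until the element is visible and enabled, then type into it
wait.until(EC.element_to_be_clickable(field1)).send_keys("some data")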

nodejs vs. ruby / understanding requests processing order

I have a simple utility that I use to resize images on the fly via URL params.
Having some trouble with the Ruby image libraries (CMYK to RGB is, how to say… "unavailable"), I gave it a shot via Node.js, which solved the issue.
Basically, if the image does not exist, Node or Ruby transforms it. Otherwise, when the image has already been requested/transformed, the Ruby or Node processes aren't touched and the image is returned statically.
The Ruby version works perfectly; a bit slow if a lot of transforms are requested at once, but very stable. It always goes through, whatever the amount (I see the images arriving on the page one after another).
With Node it also works perfectly, but when a large amount of images is requested for a single page load, the first image is transformed, then all the other requests return the very same image (the last transformed one). If I refresh the page, the first image (already transformed) is returned right away, the second one is returned correctly transformed, but then all the other images returned are the same as the one just newly transformed. And it goes on like this for every refresh. Not optimal: basically the requests are "merged" at some point and all return the same image, for a reason I don't understand.
(By 'large amount' I mean more than 1.)
The Ruby version:
get "/:commands/*" do |commands,remote_path|
path = "./public/#{commands}/#{remote_path}"
root_domain = request.host.split(/\./).last(2).join(".")
url = "https://storage.googleapis.com/thebucket/store/#{remote_path}"
img = Dragonfly.app.fetch_url(url)
resized_img = img.thumb(commands).to_response(env)
return resized_img
end
The Node.js version:
app.get('/:transform/:id', function(req, res, next){
  parser.parse(req.params, function(resized_img){
    // the transforms are done via lovell/sharp
    // parser.parse parses the params, writes the file,
    // and returns the file path
    // then:
    fs.readFileSync(resized_img, function(error, data){
      res.write(data)
      res.end()
    })
  })
})
It feels like I'm missing a crucial point about Node here. I expected the same behaviour with Node as with Ruby, but obviously the same pattern transposed to Node just does not work as expected. Node is not waiting for a request to finish processing; rather it processes requests in an order that is not clear to me.
I also understand that I'm not putting the right words to describe the issue; I hope it still speaks to some experienced users who can provide clarifications for a better understanding of what happens behind the Node scenes.

Does capybara "within" deal with ajax, why failed within the css with the error " unable to find find css "

I have a step like this:
Then(/^I can see the Eligible Bugs list "([^"]*)"$/) do |bugs|
  bugs_list = bugs.split(", ")
  assert page.has_css?(".eligible-bugs", :visible => true)
  within(".eligible-bugs") do
    bugs_list.each do |bug|
      assert page.has_content?(bug)
    end
  end
end
But the step sometimes fails at within(".eligible-bugs") do with the error 'Unable to find css ".eligible-bugs"'.
I find it odd, because the assertion has passed, which means the CSS is visible.
Why can't within find the CSS? How does this happen?
BTW, I have set my max wait time to 5.
Capybara.default_max_wait_time = 5
The only way that should happen is if the page is dynamically changing while you're running the test - at some point during your check for all the bugs' content, the page is changing and the '.eligible-bugs' element is going away. The test and the browser run separately, so how/why it is happening depends on what else your page is doing in the browser; it would also depend on what steps have come before this in the test.
Also, note that it's not necessarily disappearing between the has_css? check and the within statement first running. If it disappears at any point while the code inside the within is running, it could throw the same error as it attempts to reload the '.eligible-bugs' element.
From the title of the question I assume the list that you want to check is the result of a search or filtering action. If it is, does that action remove the existing '.eligible-bugs' element and then, after some time, replace it with a new one returned from an AJAX request or something? If that is the case, then, since you control the test data, you should wait for the correct results count to show, thereby ensuring any element replacements have completed, before checking for the text. How you do that would depend on the exact makeup of the page, but if each eligible bug were a child of '.eligible-bugs' and had a class of '.eligible-bug', I would write your test something like:
Then(/^I can see the Eligible Bugs list "([^"]*)"$/) do |bugs|
  bugs_list = bugs.split(", ")
  assert_css(".eligible-bugs > .eligible_bug", count: bugs_list.size) # wait until the expected number of results have shown up
  within(".eligible-bugs") do
    bugs_list.each do |bug|
      assert_content(bug)
    end
  end
end
