Browser waits longer than expected - python-3.x

I have some code in which I'm trying to scrape a website. After a while, I think the site is slowing me down. I can't check that, but this is happening within my code:
import timeit
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

z = timeit.default_timer()
try:
    WebDriverWait(browser, 5).until(
        EC.presence_of_element_located((By.XPATH, '''
            .//div[@collectionitem="title"]/descendant::div[@class="titleWidgetLayout"]/
            descendant::h1[@class="title"]''')))
except TimeoutException:
    print('Web Scraper not loaded')
    return 'Error Load'
n = timeit.default_timer()
print('Time actually waited', n - z)
I find that while at the beginning this wait was about 1-2 seconds, it eventually grows to about 25 seconds. This not only slows down the code more than is acceptable; how can the time waited be far longer than the 5 seconds I set as the timeout trigger?
I guess this might be throttling by the page, but in any case, how can I fix it?

This is most likely occurring because the webdriver waits until the page has completely finished loading (until the page's loading indicator is gone) before it starts looking for any elements.
So if your page takes 23 seconds to load and your element is then located 3 seconds after page load, your WebDriverWait condition never raises a timeout, yet you have waited 26 seconds.
You can try to set the Page Load Timeout with this:
browser.set_page_load_timeout(5)
This way, if the page is taking too long to load, you can skip it. Other than that, you will have to wait until the page load is complete.
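For example, a minimal sketch of combining the two timeouts; the url variable and the window.stop() workaround are assumptions, not from the original question:
from selenium.common.exceptions import TimeoutException

browser.set_page_load_timeout(5)  # give up on the page load itself after 5 seconds
try:
    browser.get(url)  # url is a placeholder for the page being scraped
except TimeoutException:
    # the page load exceeded 5 s; stop loading and carry on with the scrape
    browser.execute_script("window.stop()")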

Related

What is the best way to run a background process in a Dash app?

I have a Dash application that queries an API, based on a user search query, performs some calculations on the response, then displays the final results to the user on a Dash app. In order to provide a quick response to the user, I am trying to set up a quick result callback and a full result long_callback.
The quick result will grab limited results from the API and display results to the user within 10-15 seconds, while the full results will run in the background, collecting all results (which can take up to 2 minutes), then updates the page with the full results when they are available.
I am curious what the best way to perform this action is, as I have run into forking issues with my current attempt.
My current attempt: using diskcache.Cache() as the backend for the DiskcacheLongCallbackManager, and a database txt file to store the availability of results.
I have a database txt file that stores a dictionary, with the keys being the search query and the fields being quick_results: bool, full_results: bool, file_path: str, timestamp: dt (as str).
When a search query is entered and submit is pressed, a callback loads the database file as a variable and then checks the dictionary keys for the presence of this search query.
If it finds the query in the keys of the database, it loads the saved feather file from the provided file_path and returns it to the dash app for generation of the page content.
If it does not find the query in the database keys, it requests limited data from the API, runs calculations, saves the DataFrame as a feather file on disk, then creates an entry in the database with the search query (as the key), the file path of the saved feather file, the current timestamp, and sets the quick_results value to True.
It then loads this feather file from the file_path created and returns it to the dash app for generation of the page content.
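To illustrate the quick-search flow just described, here is a hypothetical sketch; run_limited_api_search, the file paths, and the JSON format of the database file are placeholders, not the actual implementation:
import json
from datetime import datetime
import pandas as pd

DB_PATH = "database.txt"  # placeholder path to the database file

def quick_search(query):
    with open(DB_PATH) as f:
        db = json.load(f)  # assumes the dictionary is stored as JSON
    if query in db:
        # results already cached on disk; load and return them
        return pd.read_feather(db[query]["file_path"])
    df = run_limited_api_search(query)  # placeholder for the limited API request + calculations
    file_path = f"results/{query}.feather"
    df.to_feather(file_path)
    db[query] = {"quick_results": True, "full_results": False,
                 "file_path": file_path, "timestamp": datetime.now().isoformat()}
    with open(DB_PATH, "w") as f:
        json.dump(db, f)
    return df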
A long_callback is triggered at the same time as the above callback, with a 20 second sleep to prevent overlap with the quick search. This callback also loads the database file as a variable and checks if the query is present in the database keys.
If found, it then checks if the full results value is True and if the timestamp is more than 0 days old.
If the full results are unavailable or are more than 0 days old, the long_callback requests full results from the API, performs the same calculations, then updates the already existing search query in the database, making the full_results True and the timestamp the time of completion for the full search.
It then loads the feather file from the file_path and returns it to the dash app for generation of the page content.
If the results are available and less than 1 day old, the long callback simply loads the feather file from the provided file_path and returns it to the dash app for generation of the page content.
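For reference, the callback wiring described above would look roughly like this; the layout, component ids, and return values are placeholder assumptions, not the actual app:
import time
import diskcache
from dash import Dash, Input, Output, State, html, dcc
from dash.long_callback import DiskcacheLongCallbackManager

cache = diskcache.Cache("./cache")
app = Dash(__name__, long_callback_manager=DiskcacheLongCallbackManager(cache))

app.layout = html.Div([
    dcc.Input(id="query"),
    html.Button("Submit", id="submit"),
    html.Div(id="quick-results"),
    html.Div(id="full-results"),
])

@app.callback(
    Output("quick-results", "children"),
    Input("submit", "n_clicks"),
    State("query", "value"),
    prevent_initial_call=True,
)
def quick_search_callback(n_clicks, query):
    # limited API request + calculations, cached to a feather file
    return f"Quick results for {query}"

@app.long_callback(
    Output("full-results", "children"),
    Input("submit", "n_clicks"),
    State("query", "value"),
    prevent_initial_call=True,
)
def full_search_callback(n_clicks, query):
    time.sleep(20)  # stay behind the quick search, as described above
    # full API request + calculations, database update, feather reload
    return f"Full results for {query}"

if __name__ == "__main__":
    app.run_server()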
The problem I am currently facing is that I am getting a weird forking error in the long callback on only one of the conditions for a full search. I currently have the long_callback set up to perform a full search only if the full_results flag is False or the results are more than 0 days old. When the full_results flag is False, the callback runs as expected, updates the database and returns the full results. However, when the results are available but more than 0 days old, the callback hits a forking error and is unable to complete.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec(). Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
I am at a loss as to why the function would run without error on one of the conditions, but then have a forking error on the other condition. The process that runs after both conditions is exactly the same.
By using print statements, I have noticed that this forking error triggers when the function tries to call the requests.get() function on the API.
If this issue is related to how I have set up the background process functionality, I would greatly appreciate some suggestions or assistance on how to do this properly so that I will not face this forking error.
If there is any information I have left out that will be helpful, please let me know and I will try to provide it.
Thank you for any help you can provide.

How to handle a Time Out?

I am running an Excel Power Query in a loop. The query runs.
For a reason related to the internet connection (I am not in a fiber-covered area), the query sometimes fails to load the data, returning a timeout error.
Given that it is possible the entire loop cycle has not completed, I would like to stop the refresh before the error pops up and resume the code despite no data having been loaded.
The code breaks where it is shown in the pic.
How can I keep the code running before the timeout appears?
Let's say I would like the code to keep executing if the data cannot be loaded after 90 seconds.
Why don't you try to change the refresh period?
You can also try to look at the code generated by Power Query by Unchecking the "Enable Background Refresh" in the Data -> Connection -> Properties.
You can also add a timeout of your choice by passing a Timeout option after the URL you defined (in Web.Contents):
Web.Contents(url, [Timeout = #duration(X, Y, Z, N)])
where X is days, Y is hours, Z is minutes, and N is seconds.
Otherwise, if you are really interested in killing the web query after the default 100 seconds, then before starting the code you can put this line:
On Error Resume Next

Is there any way to stop implicit wait during try/except?

I have a Selenium script that automates signing up on a website. During the process, I have driver.implicitly_wait(60), BUT there is a segment of code with a try/except statement that tries to click something and continues if it can't be found. The issue is that if the element isn't there to be clicked, it waits 60 seconds before running the except part of the code. Is there any way I can have it not wait the 60 seconds before doing the except part? Here is my code:
if PROXYSTATUS == False:
    driver.find_element_by_css_selector("img[title='中国大陆']").click()
else:
    try:
        driver.find_element_by_css_selector("img[title='中国大陆']").click()
    except:
        pass
In other words if a proxy is used, a pop up will occasionally display, but sometimes it won't. That's why I need the try/except.
You can use set_page_load_timeout to change the default timeout to a lower value that suits you.
You will still need to wait for some amount of time, otherwise you might simply never click on the element you are looking for, because your script will be faster than the page load.
In the try block you can lower the timeout, say to 10 seconds, by using driver.implicitly_wait(10), or even to 0. Place this before the find-element statement in the try block. Then add a finally block and set it back to 60 with driver.implicitly_wait(60).
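Applied to the snippet above, that suggestion would look roughly like this; a sketch, assuming NoSuchElementException is the error you want to swallow:
from selenium.common.exceptions import NoSuchElementException

try:
    driver.implicitly_wait(0)  # don't wait around if the popup never appears
    driver.find_element_by_css_selector("img[title='中国大陆']").click()
except NoSuchElementException:
    pass  # popup not shown this time
finally:
    driver.implicitly_wait(60)  # restore the original implicit wait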

Applescript - Apple Event Timed out

I'm trying to open a very large excel file (*.xls) using applescript. The code is very simple, it looks like it is working, however after a few minutes I receive the following message:
Result:
error "Microsoft Excel got an error: AppleEvent timed out." number -1712
Any idea how to solve it? BTW, using Automator doesn't work either.
Here's my code
tell application "Microsoft Excel"
activate
open "/Users/sergioguerra1/Desktop/Detektor/Etapa II/Reporte General.xls"
delay 300
end tell
Try wrapping the open command in a with timeout block.
eg.
tell application "Microsoft Excel"
activate
with timeout of 3600 seconds
open "/Users/sergioguerra1/Desktop/Detektor/Etapa II/Reporte General.xls"
end timeout
end tell
This will override AppleScript's default timeout of 2 minutes, giving it longer to finish executing that command.
More info is in the AppleScript docs.
Conversely, if you want to open your Excel file(s) without having to wait 2 minutes or longer (e.g. 3600 seconds) for a timeout to occur, you may prefer to deliberately trigger the timeout sooner and catch the error with a try block.
I've found this problem occurs when I use the "linked tables" feature in Excel and the linked table is no longer accessible. Excel pops up a nasty dialog halfway through the open command and just hangs until you press ESC twice (or similar), e.g.:
try
    with timeout of 10 seconds
        open some_excel_File
    end timeout
on error -- excel timeout probably due to linked tables
    -- if the file has "linked tables" we need to hit esc twice after opening it.
    tell application "System Events"
        repeat 2 times
            key code 53
            delay 3
        end repeat
    end tell
end try

Cron Job run between a range of minutes?

OK, so I see all these questions about crons running every 10 minutes, every whatever minutes, but it's always a SET interval. However, I want mine to run RANDOMLY every 10-15 minutes, so sometimes it'll pick 13 min, 12 min, etc.
Is there any possible way to do that on a shared server? Or would I have to program it via PHP and have that script run every time someone visits the page I want to refresh?
Thanks for your input!
I usually let the cronjob run every 10 minutes, and at the top of the script that cron executes I place a sleep with a random number of seconds covering the extra random "window" I need - i.e. in your case 5 min => 300 sec.
Something like:
#!/usr/bin/php
<?php
sleep(rand(1, 300));
print "starting job!\n";
?>
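The matching crontab entry would then be something like the following; the script path here is a placeholder, not from the original answer:
*/10 * * * * /usr/bin/php /path/to/job.php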
