Continue After Exception - Python 3

I am trying to use a try/except block in my code to catch an error, sleep for 5 seconds, and then continue where it left off. The following is my code; currently, as soon as it catches the exception, it stops instead of continuing.
import time
from botocore.exceptions import ClientError

tries = 0
try:
    for pag_num, page in enumerate(one_submitted_jobs):
        if 'NextToken' in page:
            print("Token:", pag_num)
        else:
            print("No Token in page:", pag_num)
except ClientError as exception_obj:
    if exception_obj.response['Error']['Code'] == 'ThrottlingException':
        print("Throttling Exception Occurred.")
        print("Retrying...")
        print("Attempt No.: " + str(tries))
        time.sleep(5)
        tries += 1
    else:
        raise
How can I make it continue after the exception? Any help would be great.
Note - I am trying to catch AWS's ThrottlingException error in my code.
The following code demonstrates to @Selcuk what I currently have, based on his answer. It will be deleted once we agree whether I am doing it correctly or not.
tries = 1
pag_num = 0
# Only needed if one_submitted_jobs is not an iterator:
one_submitted_jobs = iter(one_submitted_jobs)
while True:
    try:
        page = next(one_submitted_jobs)
        # do things
        if 'NextToken' in page:
            print("Token: ", pag_num)
        else:
            print("No Token in page:", pag_num)
        pag_num += 1
    except StopIteration:
        break
    except ClientError as exception_obj:
        # Sleep if we are being throttled
        if exception_obj.response['Error']['Code'] == 'ThrottlingException':
            print("Throttling Exception Occurred.")
            print("Retrying...")
            print("Attempt No.: " + str(tries))
            time.sleep(3)
            tries += 1

You are not able to keep running because the exception occurs in your for statement. This is a bit tricky because the for statement has no way of knowing whether there are more items to process or not.
A workaround could be to use a while loop instead:
import time
from botocore.exceptions import ClientError

pag_num = 0
# Only needed if one_submitted_jobs is not an iterator:
one_submitted_jobs = iter(one_submitted_jobs)
while True:
    try:
        page = next(one_submitted_jobs)
        # do things
        pag_num += 1
    except StopIteration:
        break
    except ClientError as exception_obj:
        # Sleep if we are being throttled, otherwise re-raise
        if exception_obj.response['Error']['Code'] == 'ThrottlingException':
            time.sleep(5)
        else:
            raise
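The same pattern generalizes to any throttled AWS call. Below is a minimal sketch of a reusable retry helper; the function name, attempt count, and delay are arbitrary choices for illustration, not part of the original answer:

import time
from botocore.exceptions import ClientError

def retry_on_throttle(func, max_tries=5, delay=5):
    # Call func(), sleeping and retrying whenever AWS throttles it;
    # any other ClientError is re-raised immediately.
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except ClientError as exc:
            if exc.response['Error']['Code'] != 'ThrottlingException':
                raise
            print("Throttled (attempt %d of %d), sleeping..." % (attempt, max_tries))
            time.sleep(delay)
    raise RuntimeError("still throttled after %d attempts" % max_tries)

# Usage, fetching the next page through the helper:
# page = retry_on_throttle(lambda: next(one_submitted_jobs))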

Related

How to make selenium threads run (each thread with its own driver)

I have a Python 3 script that needs to make thousands of requests to multiple different websites and check whether their source code passes some predefined rules.
I am using Selenium to make the requests because I need to get the source code after the JS finishes executing, but due to the high number of URLs I need to check, I am trying to run multiple threads simultaneously. Each thread creates and maintains its own webdriver instance to make the requests. The problem is that after a while all the threads go silent and simply stop executing, leaving just a single thread doing all the work. Here is the relevant part of my code:
def get_browser(use_firefox=True):
    if use_firefox:
        options = FirefoxOptions()
        options.headless = True
        browser = webdriver.Firefox(options=options)
        browser.implicitly_wait(4)
        return browser
    else:
        chrome_options = ChromeOptions()
        chrome_options.add_argument("--headless")
        browser = webdriver.Chrome(chrome_options=chrome_options)
        browser.implicitly_wait(4)
        return browser
def start_validations(urls, rules, results, thread_id):
    try:
        log("thread %s started" % thread_id, thread_id)
        browser = get_browser(thread_id % 2 == 1)
        while not urls.empty():
            url = "http://%s" % urls.get()
            try:
                log("starting %s" % url, thread_id)
                browser.get(url)
                time.sleep(0.5)
                WebDriverWait(browser, 6).until(selenium_wait_reload(4))
                html = browser.page_source
                result = check_url(html, rules)
                original_domain = url.split("://")[1].split("/")[0].replace("www.", "")
                tested_domain = browser.current_url.split("://")[1].split("/")[0].replace("www.", "")
                redirected_url = "" if tested_domain == original_domain else browser.current_url
                results.append({"Category": result, "URL": url, "Redirected": redirected_url})
                log("finished %s" % url, thread_id)
            except Exception as e:
                log("couldn't test url %s" % url, thread_id)
                log(str(e), thread_id)
                results.append({"Category": "Connection Error", "URL": url, "Redirected": ""})
                browser.quit()
                time.sleep(2)
                browser = get_browser(thread_id % 2 == 1)
    except Exception as e:
        log(str(e), thread_id)
    finally:
        log("closing thread", thread_id)
        browser.quit()
def calculate_progress(urls):
    progress_folder = "%sprogress/" % WEBROOT
    if not os.path.exists(progress_folder):
        os.makedirs(progress_folder)
    initial_size = urls.qsize()
    while not urls.empty():
        current_size = urls.qsize()
        on_queue = initial_size - current_size
        progress = '{0:.0f}'.format((on_queue / initial_size * 100))
        for progress_file in os.listdir(progress_folder):
            file_path = os.path.join(progress_folder, progress_file)
            if os.path.isfile(file_path) and not file_path.endswith(".csv"):
                os.unlink(file_path)
        os.mknod("%s%s" % (progress_folder, progress))
        time.sleep(1)
if __name__ == '__main__':
    while True:
        try:
            log("scraper started")
            if os.path.isfile(OUTPUT_FILE):
                os.unlink(OUTPUT_FILE)
            manager = Manager()
            rules = fetch_rules()
            urls = manager.Queue()
            fetch_urls()
            results = manager.list()
            jobs = []
            p = Process(target=calculate_progress, args=(urls,))
            jobs.append(p)
            p.start()
            for i in range(THREAD_POOL_SIZE):
                log("spawning thread with id %s" % i)
                p = Process(target=start_validations, args=(urls, rules, results, i))
                jobs.append(p)
                p.start()
                time.sleep(2)
            for j in jobs:
                j.join()
            save_results(results, OUTPUT_FILE)
            log("scraper finished")
        except Exception as e:
            log(str(e))
As you can see, at first I thought I could only have one instance of each browser, so I tried to run at least Firefox and Chrome in parallel, but this still leaves only one thread doing all the work.
Sometimes the driver crashed and the thread stopped working even though it was inside a try/except block, so I started getting a new instance of the browser every time this happens, but it still didn't work. I also tried waiting a few seconds between creating each instance of the driver, still with no results.
Here is a pastebin of one of the log files:
https://pastebin.com/TsjZdRYf
A strange thing I noticed is that almost every time, the only thread that keeps running is the last one spawned (with id 3).
Thanks for your time and your help!
EDIT:
[1] Here is the full code: https://pastebin.com/fvVPwPVb
[2] custom selenium wait condition: https://pastebin.com/Zi7nbNFk
Am I allowed to curse on SO? I solved the problem, and I don't think this answer should exist on SO because nobody else will benefit from it. The problem was a custom wait condition that I had created. This class is in the pastebin that was added in edit 2, but I'll also add it here for convenience:
import time

class selenium_wait_reload:
    def __init__(self, desired_repeating_sources):
        self.desired_repeating_sources = desired_repeating_sources
        self.repeated_pages = 0
        self.previous_source = None

    def __call__(self, driver):
        while True:
            current_source = driver.page_source
            if current_source == self.previous_source:
                self.repeated_pages = self.repeated_pages + 1
                if self.repeated_pages >= self.desired_repeating_sources:
                    return True
            else:
                self.previous_source = current_source
                self.repeated_pages = 0
            time.sleep(0.3)
The goal of this class was to make Selenium wait, because the JS could still be loading additional DOM.
So the class makes Selenium wait a short time and check the source code, wait a little again and check again. It repeats this until the source code is the same three times in a row.
The problem is that some pages have a JS carousel, so the source code is never the same. I thought that in cases like this the WebDriverWait second parameter would make it crash with a TimeoutException. I was wrong.
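For reference, here is a sketch of a corrected condition (my reconstruction, not the author's final code): instead of looping internally, it checks the source once per call and returns a boolean, letting WebDriverWait drive the polling so its timeout can actually fire.

class selenium_wait_stable_source:
    def __init__(self, desired_repeating_sources):
        self.desired_repeating_sources = desired_repeating_sources
        self.repeated_pages = 0
        self.previous_source = None

    def __call__(self, driver):
        # Check once per invocation; WebDriverWait re-invokes us until we
        # return True or its own timeout expires with a TimeoutException.
        current_source = driver.page_source
        if current_source == self.previous_source:
            self.repeated_pages += 1
        else:
            self.previous_source = current_source
            self.repeated_pages = 0
        return self.repeated_pages >= self.desired_repeating_sources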

How to make requests keep trying to connect to url regardless of exception from where it left off in the list?

I have a list of IDs that I am passing into a URL within a for loop:
import time
import requests

L = [1, 2, 3]
lst = []
for i in L:
    url = 'URL.Id={}'.format(i)
    xml_data1 = requests.get(url).text
    lst.append(xml_data1)
    time.sleep(1)
    print(xml_data1)
I am trying to create a try/except so that, regardless of the error, the requests library keeps trying to connect to the URL, picking up from the ID in the list (L) where it left off. How would I do this?
I set up this try/except from this answer (Correct way to try/except using Python requests module?).
However, this forces the system to exit.
try:
    for i in L:
        url = 'URL.Id={}'.format(i)
        xml_data1 = requests.get(url).text
        lst.append(xml_data1)
        time.sleep(1)
        print(xml_data1)
except requests.exceptions.RequestException as e:
    print(e)
    sys.exit(1)
You can put the try-except block in a loop and only break the loop when the request does not raise an exception:
L = [1, 2, 3]
lst = []
for i in L:
    url = 'URL.Id={}'.format(i)
    while True:
        try:
            xml_data1 = requests.get(url).text
            break
        except requests.exceptions.RequestException as e:
            print(e)
    lst.append(xml_data1)
    time.sleep(1)
    print(xml_data1)
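One caveat with the bare while True: an ID whose URL is permanently unreachable will retry forever. A variation with a capped attempt count and a short pause (a sketch; the attempt count, timeout, and delay are arbitrary choices) skips such IDs instead:

import time
import requests

L = [1, 2, 3]
lst = []
for i in L:
    url = 'URL.Id={}'.format(i)
    for attempt in range(5):  # give each ID at most 5 tries
        try:
            xml_data1 = requests.get(url, timeout=10).text
            break
        except requests.exceptions.RequestException as e:
            print(e)
            time.sleep(2)  # brief pause before retrying
    else:
        continue  # all attempts failed; move on to the next ID
    lst.append(xml_data1)
    print(xml_data1)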

How to find why thread is suspended when using multiprocessing or bypass that?

I use feedparser to get RSS feeds from some sites; my core code looks like this:
def parseworker(procnum, result_queue, return_dict, source_link):
    try:
        data = feedparser.parse(source_link)
        return_dict[procnum] = data
    except Exception as e:
        print(str(e))
    result_queue.put(source_link + 'grabbed')

def infoworker(procnum, timeout, result_queue, source_name, source_link):
    text = 'recheck ' + source_name + ': ' + '...'
    progress = ''
    for x in range(timeout):
        progress += '.'
        sys.stdout.write('\r' + text + progress)
        sys.stdout.flush()
        time.sleep(1)
    result_queue.put('time out')

def parsecaller(link, timeout, timestocheck):
    return_dict = multiprocessing.Manager().dict()
    result_queue = multiprocessing.Queue()
    counter = 1
    jobs = []
    result = []
    while not (counter > timestocheck):
        p1 = multiprocessing.Process(target=infoworker, args=(11, timeout, result_queue, source_name, link))
        p2 = multiprocessing.Process(target=parseworker, args=(22, result_queue, return_dict, link))
        jobs.append(p1)
        jobs.append(p2)
        p1.start()
        p2.start()
        result_queue.get()
        p1.terminate()
        p2.terminate()
        p1.join()
        p2.join()
        result = return_dict.values()
        if not result or result[0].bozo:
            print(' bad - no data', flush=True)
            result = -1
        else:
            print(' ok ', flush=True)
            result = result[0]
            break
        counter += 1
    if result == -1:
        raise bot_exceptions.ParserExceptionData()
    elif result == -2:
        raise bot_exceptions.ParserExceptionConnection()
    else:
        return result

if __name__ == '__main__':
    multiprocessing.freeze_support()
    multiprocessing.set_start_method('spawn')
    try:
        data = parsecaller(source_link, timeout=wait_time, timestocheck=check_times)
    except Exception as e:
        print(str(e))
        continue  # note: 'continue' implies this sits inside a loop in the full program
It works well, but after some random amount of time it goes into a suspended state and does nothing, like an infinite boot loop. It may suspend after 4 hours or after 3 days; it's random.
I tried to solve the problem with multiprocessing: the main process runs a timer, like infoworker. When infoworker stops, it puts a "result" on the queue, which unblocks result_queue.get() in parsecaller, which then continues and terminates both processes. But it does not work. Today, after 11 hours, I found my code suspended in multiprocessing's managers.py:
def serve_forever(self):
    '''
    Run the server forever
    '''
    self.stop_event = threading.Event()
    process.current_process()._manager_server = self
    try:
        accepter = threading.Thread(target=self.accepter)
        accepter.daemon = True
        accepter.start()
        try:
            while not self.stop_event.is_set():
                self.stop_event.wait(1)
        except (KeyboardInterrupt, SystemExit):
            pass
    finally:
        if sys.stdout != sys.__stdout__: # what about stderr?
            util.debug('resetting stdout, stderr')
            sys.stdout = sys.__stdout__
            sys.stderr = sys.__stderr__
        sys.exit(0)
The whole time, it was stuck in:
while not self.stop_event.is_set():
    self.stop_event.wait(1)
I think that either the GIL somewhere is not allowing any other threads in the processes to work, or feedparser goes into an infinite loop. And of course it gets suspended on random RSS sources.
My environment:
Mac OS 10.12.6 (the same situation also occurred on Windows 7 and Windows 10)
Python 3.7.0 (also on 3.6.2 and 3.6.5)
PyCharm 2017.2.2
My questions:
How can I understand why it gets suspended (any recipe for what to do)?
How can I bypass that state (any recipe for what to do)?
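No answer is recorded for this question here, but one defensive pattern worth noting (a sketch against the question's parsecaller names, not a diagnosis of the hang) is to bound every blocking call, so the parent process can recover instead of waiting forever:

import queue  # multiprocessing.Queue.get raises queue.Empty on timeout

try:
    # Wait slightly longer than infoworker's own timer before giving up.
    result_queue.get(timeout=timeout + 5)
except queue.Empty:
    print('no worker reported back - terminating and retrying', flush=True)
finally:
    for p in (p1, p2):
        p.terminate()
        p.join()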

How to handle ServiceRequestError in Python

I am looking at face detection using the Kairos API while working on this program, with the following code:
def Test():
    image = cap.read()[1]
    cv2.imwrite("opencv_frame.png", image)
    recognized_faces = kairos_face.verify_face(file="filepath/opencv_frame.png", subject_id='David', gallery_name='Test')
    print(recognized_faces)
    if recognized_faces.get('images')[0].get('transaction').get('status') != 'success':
        print('No')
    else:
        print('Hello', recognized_faces.get('images')[0].get('transaction').get('subject_id'))
This works fine if I look straight at the camera, but if I turn my head it breaks with the following response:
kairos_face.exceptions.ServiceRequestError: {'Errors': [{'Message': 'no faces found in the image', 'ErrCode': 5002}]}
How can I handle the exception and force the Test function to keep running until a face is detected?
Can't you just catch the exception and try again?
def Test():
    captured = False
    while not captured:
        try:
            image = cap.read()[1]
            cv2.imwrite("opencv_frame.png", image)
            recognized_faces = kairos_face.verify_face(file="filepath/opencv_frame.png", subject_id='David', gallery_name='Test')
            captured = True
        except kairos_face.exceptions.ServiceRequestError:
            pass  # optionally wait
    print(recognized_faces)
    if recognized_faces.get('images')[0].get('transaction').get('status') != 'success':
        print('No')
    else:
        print('Hello', recognized_faces.get('images')[0].get('transaction').get('subject_id'))
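Building on that answer, the "optionally wait" comment could become a short pause so failed frames are not grabbed in a tight loop. A sketch reusing the question's cap, cv2, and kairos_face names; the function name, attempt cap, and 0.5 s pause are arbitrary choices:

import time

def capture_recognized_face(max_attempts=50):
    # Keep grabbing frames until Kairos finds a face, pausing between tries.
    for _ in range(max_attempts):
        image = cap.read()[1]
        cv2.imwrite("opencv_frame.png", image)
        try:
            return kairos_face.verify_face(file="filepath/opencv_frame.png",
                                           subject_id='David',
                                           gallery_name='Test')
        except kairos_face.exceptions.ServiceRequestError:
            time.sleep(0.5)  # brief pause before trying the next frame
    return None  # no face found within the attempt budget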

Unable to catch TweepError exception

I tried to catch the TweepError exception in a while/try/except loop, but was unsuccessful. The following code stops running when a TweepError/RateLimitError occurs.
import tweepy
import time

name_set = ('name1', 'name2', 'name3')
result = []
for screen_name in name_set:
    while True:
        profile = api.get_user(screen_name=screen_name)
        try:
            print('collecting user %s' % screen_name)
            result.append(profile)
            break
        except tweepy.RateLimitError:
            print('sleep 15 minutes')
            time.sleep(900)
            continue
        except tweepy.TweepError as e:
            print(e)
            print('Account %s' % screen_name)
            break
        else:
            print('Account %s' % screen_name)
            break
The exception raised is:
TweepError: [{'message': 'User not found.', 'code': 50}]
You should put the API call inside the try block so the exception can be caught:
try:
    profile = api.get_user(screen_name=screen_name)
    print('collecting user %s' % screen_name)
    ...
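For completeness, a sketch of the question's full loop with the call moved inside the try block (same structure and names as the question; only the placement of get_user changes):

import time
import tweepy

name_set = ('name1', 'name2', 'name3')
result = []
for screen_name in name_set:
    while True:
        try:
            # The call that can raise must be inside the try block.
            profile = api.get_user(screen_name=screen_name)
            print('collecting user %s' % screen_name)
            result.append(profile)
            break
        except tweepy.RateLimitError:
            print('sleep 15 minutes')
            time.sleep(900)
            continue
        except tweepy.TweepError as e:
            print(e)
            print('Account %s' % screen_name)
            break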
